
viona worker LWPs should not be CPU-pinned #1146

Merged
iximeow merged 4 commits into master from viona-worker-pins
May 29, 2026

@iximeow iximeow commented May 27, 2026

VNA_IOC_RING_INIT* includes, as part of init'ing a ring, creating an LWP in the calling process for that ring's worker thread. This is all fine and good, except that the new LWP inherits the CPU binding of the original ioctl-caller. The ioctl-caller, in turn, is functionally always a vCPU LWP, servicing an MMIO exit from a guest that is trying to start its virtio NIC.

Since one vCPU is probably handling a routine that loops over all desired virtio rings and enables them, the effective outcome is that all viona rings are handled on some sole arbitrary host CPU that is also responsible for a vCPU.

Avoid all of this by temporarily unbinding whatever LWP is getting the viona ring going, then rebinding before we've continued on.
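As a rough illustration of the unbind/rebind shape described above, here is a minimal sketch. It does not use the real pbind crate or any illumos call: the `Binder` trait, `with_unbound`, and `FakeBinder` are hypothetical names invented for this example, and `Option<i32>` stands in for a processor ID with `None` meaning "unbound".

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the host's processor-binding interface.
/// `None` means "not bound to any CPU".
trait Binder {
    /// Set the calling thread's binding and return the previous one.
    fn swap_binding(&self, new: Option<i32>) -> Option<i32>;
}

/// Run `f` with the caller unbound, then restore the earlier binding.
/// Anything `f` creates while unbound has no binding to inherit.
/// If `f` panics, the earlier binding is not restored.
fn with_unbound<B: Binder, T>(binder: &B, f: impl FnOnce() -> T) -> T {
    let previous = binder.swap_binding(None);
    let result = f();
    binder.swap_binding(previous);
    result
}

/// In-memory fake (an assumption of this sketch) so it runs anywhere.
struct FakeBinder {
    current: Cell<Option<i32>>,
}

impl Binder for FakeBinder {
    fn swap_binding(&self, new: Option<i32>) -> Option<i32> {
        self.current.replace(new)
    }
}
```

With a `FakeBinder` that starts bound to CPU 7, a closure passed to `with_unbound` observes no binding while it runs, and the binding is back to 7 once the call returns.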

iximeow added 2 commits May 27, 2026 00:08
@iximeow iximeow added the bug (Something that isn't working.) and networking (Related to networking devices/backends.) labels May 27, 2026

iximeow commented May 28, 2026

I haven't been able to get this on a racklette quite yet, but I've tested the effect of this change by starting a large (68 vCPU) VM on dogfood, setting up iperf3 -s there, and then using a smaller (4 vCPU, not that it matters) VM elsewhere to send traffic to it on the VPC IP. The status quo had the large VM receiving around 22 Gbit/sec. When I unbound the viona vring LWPs (incredibly scientific: for lwp in [list]; do pbind -u $lwp; done), the received throughput jumped to 26-31 Gbit/sec, stayed there for a few seconds, and then fell to 22 Gbit/sec again.

Restarting the iperf3 client saw a steady 31 Gbit/sec. I don't think any of this compares in an expected way with other benchmarks I know we've done, but I'm plenty convinced this is worth checking out in a more rigorous way.

Comment thread crates/pbind/src/lib.rs
Comment on lines +89 to +90
let newbind: processorid_t = bind_cpu.unwrap_or(PBIND_NONE);
let mut obind: processorid_t = PBIND_NONE;
Member

hm, what are the explicit processorid_t type annotations doing here? seems pretty clearly inferrable. is this just due to being sketched out by the cast in the unsafe block?

Member Author

yeah, in particular &mut obind as *mut processorid_t feels like an easy way to type confusion so I'm being a bit extra on the type annotations. and newbind got one because it felt weird to only type obind when they're the same.

Comment thread crates/pbind/src/lib.rs
Comment on lines +124 to +125
/// If the function panics, the LWP's original processor binding will not be
/// restored.
Member

it seems this could be pretty easily achieved with a drop guard, but maybe it's not worth it?

Member Author

since propolis-server/propolis-standalone are panic=abort i kinda don't care too much, but also even with a drop guard there's no guarantee that runs either right?
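For reference, the drop-guard idea from this thread could look roughly like the sketch below. `RebindGuard` and `binding_after_panic` are hypothetical names, and a `Cell<Option<i32>>` stands in for the LWP's real binding; nothing here calls into the pbind crate. The guard does restore the value when a panic unwinds, but Rust never guarantees that destructors run: with panic=abort there is no unwinding, and a guard can also be leaked with `std::mem::forget`.

```rust
use std::cell::Cell;
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical guard: remembers a binding and writes it back on drop.
struct RebindGuard<'a> {
    slot: &'a Cell<Option<i32>>,
    previous: Option<i32>,
}

impl<'a> RebindGuard<'a> {
    /// Clear the binding held in `slot`, remembering what was there.
    fn unbind(slot: &'a Cell<Option<i32>>) -> Self {
        let previous = slot.replace(None);
        RebindGuard { slot, previous }
    }
}

impl Drop for RebindGuard<'_> {
    fn drop(&mut self) {
        // Runs on normal scope exit and during unwinding, but not when
        // the process aborts or the guard is leaked.
        self.slot.set(self.previous);
    }
}

/// Returns the binding left behind after a panic while "unbound".
/// Only meaningful when panics unwind (the default panic strategy).
fn binding_after_panic(initial: Option<i32>) -> Option<i32> {
    let slot = Cell::new(initial);
    let outcome = catch_unwind(AssertUnwindSafe(|| {
        let _guard = RebindGuard::unbind(&slot);
        panic!("simulated failure during ring init");
    }));
    assert!(outcome.is_err());
    slot.get()
}
```

Under the default unwinding strategy, `binding_after_panic(Some(3))` gives back `Some(3)`; built with panic=abort, the process would exit at the `panic!` instead, which is the case the reply above is pointing at.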

Comment on lines +1089 to +1091
// Arguably one might not want to do such operations directly on a
// vCPU thread. Device setup isn't exactly on anyone's hot path so
// we'll live.
Member

hm, what are you imagining here? just spawning off a short-lived thread to do one ioctl, or...?

Member Author

maybe, or a separate thread to handle "control plane" operations on a device that these register reads/writes interact with by a channel. a lot of operations are allowed to be async, and we run them synchronously because it's simpler and decoupling the register access from taking a lock and bumping a field would be silly. but this is heavy enough you can see the thinking maybe?
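A minimal sketch of that "control plane thread plus channel" shape, using only the standard library, might look like this. The `Control` enum, `spawn_control_plane`, and `on_queue_enable` are hypothetical names for illustration and are not taken from propolis; the worker just records ring indices where a real device would do the heavy ring setup.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical control-plane requests a device might queue.
enum Control {
    InitRing(u16),
    Shutdown,
}

/// Spawn a worker that performs heavy operations off the caller's thread.
/// Returns the sending half and a handle yielding the rings it handled.
fn spawn_control_plane() -> (Sender<Control>, JoinHandle<Vec<u16>>) {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || {
        let mut initialized = Vec::new();
        for msg in rx {
            match msg {
                // A real device would perform the ring setup here.
                Control::InitRing(idx) => initialized.push(idx),
                Control::Shutdown => break,
            }
        }
        initialized
    });
    (tx, handle)
}

/// What a register-write handler on a vCPU thread might do:
/// queue the request and return without waiting for it.
fn on_queue_enable(tx: &Sender<Control>, ring: u16) {
    // Send errors are ignored in this sketch; the worker may be gone.
    let _ = tx.send(Control::InitRing(ring));
}
```

The cost of this design is that the register write returns before the ring is actually ready, so a real implementation would need some way to report completion or order later accesses behind it.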

@AlejandroME AlejandroME added this to the 20 milestone May 29, 2026

iximeow commented May 29, 2026

having checked that the viona tx/rx LWPs aren't unexpectedly bound to some CPU after this change, I'm going to call this "clearly better than the status quo" - from here the only question to me is "is this worth mentioning in release notes once it's in", so ... let's get it in.

@iximeow iximeow merged commit 6873add into master May 29, 2026
14 checks passed
@iximeow iximeow deleted the viona-worker-pins branch May 29, 2026 16:30

Labels

bug Something that isn't working. networking Related to networking devices/backends.

3 participants